A new procedure is presented which, based on Kendall's $\tau$, allows one to test for partial
correlation in the presence of censored data. Further, a significance level can be assigned
to the partial correlation, a problem which has not been addressed in the past, even for
uncensored data. The results of various tests with simulated data are reported. Finally,
we apply this newly developed methodology to estimate the influence of selection effects
on the correlation between the soft X-ray luminosity and both total and core radio
luminosity in a complete sample of Active Galactic Nuclei.
1 INTRODUCTION



Astronomers are frequently confronted with the problem of
missing or incomplete information. This typically happens
when a sample of sources, which has been selected for showing
emission in a certain waveband, is then observed in another
part of the electromagnetic spectrum. A lack of intrinsic
emission, absorption due to intervening material or
insufficient sensitivity of the instrument then often results in
upper limits or, more generally, in a 'censored' data set.

In our specific case a sample of Active Galactic Nuclei
(AGN) with 2.7 GHz fluxes greater than 2 Jy has been
established for which almost complete information on the 
radio and the optical continuum as well as line emission ex- 
ists (Morganti et al. 1993, Tadhunter et al. 1993, di Serego 
Alighieri et al. 1994). The soft X-ray properties of these 
objects were determined by using the flux limited ROSAT 
All-Sky Survey (Siebert et al. 1995). For about 40% of the 
objects in this sample only an upper limit on the soft X-ray 
flux could be given. 

One probable clue to the radiation mechanisms in AGN 
is to search for a relationship between the emission from 
different wavebands. Many attempts have been made in the 
past to investigate the correlations between the radio, the 
optical and the X-ray regime (e.g. Feigelson & Berg 1983, 
Fabbiano et al. 1984, Zamorani 1984, Kembhavi et al. 1986, 
Wilkes & Elvis 1987, Browne & Murphy 1987). Thus, given 
the above mentioned problems, many procedures have been 
developed by astrophysicists to deal with the problem of cor- 
relation and regression analysis with censored data (Schmitt 
1985, Feigelson & Nelson 1985, Isobe et al. 1986, Avni & 
Tananbaum 1986). 



By applying regression and correlation analysis to the 
radio and soft X-ray continuum emission of the above men- 
tioned complete sample including the upper limits (using 
ASURV Rev 1.3, La Valley et al. 1992, Feigelson & Nelson 
1985, Isobe et al. 1986), correlations of the soft X-ray lumi- 
nosity with both the total and the radio core luminosity were 
found (Siebert et al. 1995). The use of luminosities instead 
of fluxes, however, always introduces a redshift bias to the 
data, as luminosities are strongly correlated with redshift in 
flux limited samples. It is therefore crucial to estimate the 
influence of this effect on the correlations in order to be able 
to draw reliable conclusions on the true physical relationship 
between the emission from the two wavebands. Partial corre- 
lation coefficients have been used to deal with this problem 
(e.g. Kembhavi et al. 1986). However, up to now, censored 
data could not be taken into account. 

In this paper we present a method that allows partial
correlation to be applied to censored data and a significance
level to be assigned to the resulting correlation coefficient.
The structure of the paper is as follows: after introducing
the notation and the partial Kendall's $\tau$ coefficient (§2.1), this
concept is extended to censored data (§2.2). In §3 we
describe the various tests we applied and report numerical
results on both simulated and 'real' data.

We note that our method is based on rank correlation 
coefficients. Rank correlation analysis is more general than 
the frequently used linear correlation analysis and thus our 
method is also applicable to linear correlation coefficients. 

This procedure resulted from an interdisciplinary
collaboration of astrophysics and mathematical statistics
in the form of the newly founded Statistical Consulting
Center for Astronomy (SCCA). Further information
can be obtained through the World Wide Web
(http://www.stat.psu.edu/scca/homepage.html), or by
contacting SCCA@stat.psu.edu. The computer code developed
on the basis of the procedure presented in this paper is also
available from this site.



2 PARTIAL KENDALL'S $\tau$ COEFFICIENT
WITH CENSORED DATA

In this section we give a description of the partial Kendall's
$\tau$ coefficient with censored data and describe a procedure
for testing the hypothesis that the population partial
Kendall's $\tau$ is zero. In the first subsection we give a brief
introduction and background references for Kendall's rank
correlation coefficient and Kendall's partial rank correlation
coefficient with uncensored data. The procedure for censored
data is given in subsection 2.2.

2.1 Introduction and Background 

In this subsection we consider the uncensored case. Let $T = (T_1, T_2, T_3)$
be the random vector of interest, and let $T_i = (T_{1i}, T_{2i}, T_{3i})$,
$i = 1, \ldots, n$, be the sample values. For $k = 1, 2, 3$, set

$$J_k(i,j) = I(T_{ki} < T_{kj}) - I(T_{kj} < T_{ki}),$$

where $I(x < y) = 1$ if $x < y$ and $0$ otherwise. Kendall's (1938) rank
correlation coefficient between $T_k$ and $T_l$ is defined by

$$\tau_{kl} = E\left(J_k(i,j)\, J_l(i,j)\right), \quad k \neq l,$$

and its sample estimate by

$$\hat\tau_{kl} = \frac{2}{n(n-1)} \sum_{i<j} J_k(i,j)\, J_l(i,j).$$

It has been shown that $\tau$ can be extended to the case of partial
correlation and that the partial $\tau$ has the same structural
form as $\rho_{12.3}$, Pearson's partial product-moment correlation
(Kendall 1970). In particular, Kendall's partial rank
correlation coefficient between $T_1$ and $T_2$ given $T_3$ is defined
as

$$\tau_{12.3} = \frac{\tau_{12} - \tau_{13}\tau_{23}}{\left[(1-\tau_{13}^2)(1-\tau_{23}^2)\right]^{1/2}}.$$

For a general discussion of the problem of measuring partial
association see Quade (1974). A geometric interpretation of
partial correlation is given in Thomas & O'Quigley (1993).
In spite of the long history of Kendall's partial rank correlation
coefficient, there are no tests for the significance of the
partial $\tau$ (Hettmansperger 1984). See also Nelson & Yang
(1988), who study, via Monte Carlo, the performance
of the jackknife approximation to the distribution of $\hat\tau_{12.3}$.
A useful discussion on the interpretation of Kendall's partial
rank correlation coefficient can also be found in Nelson
& Yang (1988).
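
As an illustration of the definitions in this subsection, the following minimal Python sketch computes the sample Kendall's $\tau$ and the partial $\tau$ for uncensored data without ties. The function names `kendall_tau` and `partial_tau` are hypothetical and are not part of the code distributed with this paper.

```python
import numpy as np

def kendall_tau(x, y):
    """Sample Kendall tau: 2/(n(n-1)) * sum over i<j of J_x(i,j) * J_y(i,j)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # J(i,j) = I(x_i < x_j) - I(x_j < x_i), i.e. the sign of x_j - x_i
    jx = np.sign(x[None, :] - x[:, None])
    jy = np.sign(y[None, :] - y[:, None])
    iu = np.triu_indices(n, k=1)  # index pairs with i < j
    return 2.0 * np.sum(jx[iu] * jy[iu]) / (n * (n - 1))

def partial_tau(t1, t2, t3):
    """Partial Kendall tau between t1 and t2 given t3 (uncensored case)."""
    t12 = kendall_tau(t1, t2)
    t13 = kendall_tau(t1, t3)
    t23 = kendall_tau(t2, t3)
    return (t12 - t13 * t23) / np.sqrt((1.0 - t13**2) * (1.0 - t23**2))
```

The pairwise sign matrices need memory of order $n^2$, which is unproblematic for samples of a few hundred objects.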

2.2 Extension to Censored Data 

The extension of Kendall's $\tau$ to censored data was first given
by Brown, Hollander & Korwar (1974) in a biostatistical
context. A more careful derivation of its distributional properties
was given by Oakes (1982). After introducing some
notation, we describe this censored data version of Kendall's
$\tau$. The partial $\tau$ is then defined in terms of $\tau$ as in the
uncensored case. Then we describe a method for testing the
significance of the partial $\tau$. To our knowledge, this method
is new even with uncensored data, since Macklin (1982) only
verified by computer simulations that the asymptotic distribution
of Spearman's partial $\rho$ has, under the null hypothesis,
the same form as the asymptotic distribution of
Spearman's $\rho$.

Let again $T = (T_1, T_2, T_3)$ be the random vector
of interest. However, due to censoring we only observe
$(X_{1i}, \delta_{1i}, X_{2i}, \delta_{2i}, X_{3i}, \delta_{3i})$, $i = 1, \ldots, n$, where, for
$k = 1, 2, 3$, $X_{ki} = \min\{T_{ki}, C_{ki}\}$ and $\delta_{ki} = I(T_{ki} < C_{ki})$. Here
$C_{ki}$ is the censoring variable and $I(A)$ is the indicator of the
event $A$.

At this point we have to emphasize that the above is
the right censoring model common in biostatistics. In astronomy
the data are generally left censored. Left censoring,
however, can be converted to right censoring by multiplying
all data points by $-1$. (If the log of the data is being analyzed,
multiplication by $-1$ should take place after taking
logs.) With this conversion, $C_{ki}$ represents minus (the log
of) the detection limit for the $k$-th coordinate of the $i$-th
observation, $T_{ki}$ is minus (the log of) the $k$-th coordinate of
the $i$-th observation, and if $\delta_{ki} = 1$ then what is observed
(i.e. $X_{ki}$) is the variable of interest, while if $\delta_{ki} = 0$ then
only the detection limit was recorded.
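
The conversion from left to right censoring can be sketched in a few lines of Python. The function name `left_to_right_censored`, its arguments and the use of base-10 logarithms are assumptions made for this illustration only.

```python
import numpy as np

def left_to_right_censored(values, detected, take_log=True):
    """Convert left-censored data (upper limits) to the right-censoring model.

    values   : measured value for a detection, upper limit otherwise
    detected : True for a detection, False for an upper limit
    Returns (X, delta), where delta = 1 marks a detection.
    """
    values = np.asarray(values, dtype=float)
    delta = np.asarray(detected, dtype=int)
    # Take logs first (base 10 is an assumption), then multiply by -1.
    x = np.log10(values) if take_log else values
    return -x, delta
```

For an upper limit $L$ the true value lies below $L$, so after the sign change the true value lies above $-L$, which is exactly a right-censored observation with $X = -L$ and indicator $0$.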

The censored data version of the function $J$ becomes

$$J_k(i,j) = \delta_{ki}\, I(X_{ki} < X_{kj}) - \delta_{kj}\, I(X_{kj} < X_{ki}).$$

For $k, l = 1, 2, 3$, set

$$h_{kl}(i,j) = J_k(i,j)\, J_l(i,j).$$

In this notation, the censored data version of Kendall's $\tau$
between $T_k$ and $T_l$ is

$$\hat\tau_{kl} = \frac{2}{n(n-1)} \sum_{i<j} h_{kl}(i,j),$$

and the censored data version of the partial Kendall's $\tau$ between
$T_1$ and $T_2$ given $T_3$ is

$$\hat\tau_{12.3} = \frac{\hat\tau_{12} - \hat\tau_{13}\hat\tau_{23}}{\left[(1-\hat\tau_{13}^2)(1-\hat\tau_{23}^2)\right]^{1/2}}.$$
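
The censored data versions of $J$, $h$ and the partial $\tau$ translate directly into array operations. The sketch below is an illustration under the right-censoring convention of this section (indicator 1 for a detection); the function names are hypothetical and the code is not the distributed implementation.

```python
import numpy as np

def censored_J(x, delta):
    """J(i,j) = delta_i * I(X_i < X_j) - delta_j * I(X_j < X_i)."""
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=float)
    less = (x[:, None] < x[None, :]).astype(float)  # less[i, j] = I(X_i < X_j)
    return delta[:, None] * less - delta[None, :] * less.T

def censored_tau(x, dx, y, dy):
    """Censored-data Kendall tau: 2/(n(n-1)) * sum over i<j of h(i,j)."""
    h = censored_J(x, dx) * censored_J(y, dy)
    n = len(x)
    iu = np.triu_indices(n, k=1)  # index pairs with i < j
    return 2.0 * h[iu].sum() / (n * (n - 1))

def censored_partial_tau(x1, d1, x2, d2, x3, d3):
    """Partial tau between variables 1 and 2 given variable 3."""
    t12 = censored_tau(x1, d1, x2, d2)
    t13 = censored_tau(x1, d1, x3, d3)
    t23 = censored_tau(x2, d2, x3, d3)
    return (t12 - t13 * t23) / np.sqrt((1.0 - t13**2) * (1.0 - t23**2))
```

A pair contributes to $J$ only when the smaller of the two observed values is uncensored, since otherwise the ordering of the underlying values is unknown.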

Under the null hypothesis $H_0$ that the partial Kendall's $\tau$ is
zero, the above statistic is asymptotically normal with zero
mean and estimated variance (see also appendix)

$$\hat\sigma^2 = \frac{16\, n^{-1} A_n}{(1-\hat\tau_{13}^2)(1-\hat\tau_{23}^2)},$$

where

$$A_n = (n-1)^{-1} \sum_{i_1=1}^{n} \left(B_{i_1} - \bar{B}\right)^2, \qquad (1)$$

where

$$B_{i_1} = \frac{6}{(n-1)(n-2)(n-3)} \sum_{\substack{j_1 < i_2 < j_2 \\ \text{all} \neq i_1}} g(i_1, j_1, i_2, j_2),$$

$\bar{B}$ is the average of the $B_{i_1}$'s, and

$$g(i_1, j_1, i_2, j_2) = \frac{1}{24} \sum_{p} q(i_1, j_1, i_2, j_2),$$

where $\sum_p$ denotes summation over all permutations of
$(i_1, j_1, i_2, j_2)$ and

$$q(i_1, j_1, i_2, j_2) = h_{12}(i_1, j_1) - h_{13}(i_1, j_1)\, h_{23}(i_2, j_2).$$

The hypothesis of zero partial correlation coefficient is rejected
at level $\alpha$ if

$$\frac{|\hat\tau_{12.3}|}{\hat\sigma} \geq z_{\alpha/2},$$

where $z_{\alpha/2}$ denotes the $100(1 - \alpha/2)$-th percentile of the
standard normal distribution.
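
The test can be sketched by brute force for small samples. The Python code below is an illustration only, not the distributed implementation: the function name `partial_tau_test` is hypothetical, and the variance expression $16 A_n / [n (1-\hat\tau_{13}^2)(1-\hat\tau_{23}^2)]$ is an assumption of this sketch, obtained by combining the asymptotic variance $16 n^{-1} \mathrm{Var}(P_1)$ of the numerator (see appendix) with the denominator of the partial $\tau$. The cost grows like $n^4$, so it is meant for small $n$ only.

```python
import itertools
import math
import numpy as np

def _J(x, d):
    # Censored J(i,j) = d_i * I(x_i < x_j) - d_j * I(x_j < x_i)
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    less = (x[:, None] < x[None, :]).astype(float)
    return d[:, None] * less - d[None, :] * less.T

def partial_tau_test(x1, d1, x2, d2, x3, d3):
    """Return (partial tau, estimated variance, two-sided p-value)."""
    n = len(x1)
    J1, J2, J3 = _J(x1, d1), _J(x2, d2), _J(x3, d3)
    h12, h13, h23 = J1 * J2, J1 * J3, J2 * J3
    iu = np.triu_indices(n, k=1)
    norm = 2.0 / (n * (n - 1))
    t12, t13, t23 = (norm * h[iu].sum() for h in (h12, h13, h23))
    denom = (1.0 - t13**2) * (1.0 - t23**2)
    tau = (t12 - t13 * t23) / math.sqrt(denom)

    def g(idx):
        # Symmetrised kernel: average of q over all 24 permutations of idx
        total = 0.0
        for a, b, c, e in itertools.permutations(idx):
            total += h12[a, b] - h13[a, b] * h23[c, e]
        return total / 24.0

    B = np.zeros(n)
    for i1 in range(n):
        others = [m for m in range(n) if m != i1]
        s = sum(g((i1, j1, i2, j2))
                for j1, i2, j2 in itertools.combinations(others, 3))
        B[i1] = 6.0 * s / ((n - 1) * (n - 2) * (n - 3))

    A_n = np.sum((B - B.mean()) ** 2) / (n - 1)
    var = 16.0 * A_n / (n * denom)           # assumed form, see lead-in
    z = abs(tau) / math.sqrt(var)
    p = math.erfc(z / math.sqrt(2.0))        # two-sided normal p-value
    return tau, var, p
```

The null hypothesis is rejected at level $\alpha$ when the returned p-value is below $\alpha$, which is equivalent to comparing $|\hat\tau_{12.3}|/\hat\sigma$ with $z_{\alpha/2}$.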



3 NUMERICAL RESULTS AND DATA 
ANALYSIS 

3.1 Simulations 

The testing procedure described in Section 2 is based on the
asymptotic (i.e. 'large' sample size) normality of the partial
$\tau$. In practice, however, we often have to deal with small or
moderate sample sizes. Thus it is useful to have some understanding
of the performance of the procedure under such
settings. Two important performance characteristics of any
testing procedure are the attained level and the power of the
procedure. With finite samples, the attained level will not
be exactly equal to the chosen $\alpha$ because the small-sample
distribution of the test statistic is not exactly normal. The
power of a testing procedure is the probability that the procedure
will reject the null hypothesis when it is not true.
Clearly, the more pronounced the departure from the null
hypothesis, the greater the power.

Both the attained level and the power of a testing procedure
against selected alternatives can be evaluated via simulation
studies using artificially generated data sets. For the
simulation results reported we used sample size $n = 30$
and $\alpha = 0.05$. Under the null hypothesis (i.e. zero partial
correlation) the data sets were generated as follows:
$T_{1i}$, $T_{2i}$, $T_{3i}$ are all independent exponential random variables
with mean one; since all variables are generated independently,
the partial correlation coefficient between $T_1$ and
$T_2$ given $T_3$ is zero. The censoring variables $C_{1i}$, $C_{2i}$, $C_{3i}$
are independent exponential random variables with mean
four. This gives a theoretical level of censoring of 20% for
all three variables. The statistic was based on the data
$X_{ki} = \min\{T_{ki}, C_{ki}\}$, $\delta_{ki} = I(T_{ki} < C_{ki})$, $k = 1, 2, 3$,
$i = 1, \ldots, 30$. From 1000 simulated data sets the null hypothesis
was rejected 71 times.

Next, in order to get an
idea of how sensitive the test is to departures from the null
hypothesis, random samples were generated with nonzero
partial correlation. Four levels of departure from the null
hypothesis were considered. For all levels the variable $T_{3i}$
and all the censoring variables were generated as before.
Variables $T_{1i}$ and $T_{2i}$ were generated as follows. For the
first level, $T_{1i} = 0.8\,T^*_{1i} + 0.2\,T_{4i}$, $T_{2i} = 0.8\,T^*_{2i} + 0.2\,T_{4i}$,
where $T^*_{1i}$, $T^*_{2i}$, $T_{4i}$ are all independent exponential random
variables (and independent from $T_{3i}$) with mean one. Thus
$T_{1i}$, $T_{2i}$ are dependent due to the presence of the common
$T_{4i}$, and this dependence is the same when the independent
$T_{3i}$ is held fixed. For the second level, $T_{1i} = 0.6\,T^*_{1i} + 0.4\,T_{4i}$,
$T_{2i} = 0.6\,T^*_{2i} + 0.4\,T_{4i}$. For the third and fourth levels the
coefficients become 0.4, 0.6, and 0.2, 0.8 respectively. Thus,
level one represents the smallest departure from the null hypothesis
and level four represents the largest. In particular,
the Pearson partial correlation for level one is 0.06, and for
levels two, three and four, it becomes 0.31, 0.69, and 0.94
respectively. From 1000 generated data sets from each of the
four levels the null hypothesis was rejected 109, 345, 812, and
1000 times, for levels one, two, three, and four, respectively.
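
The simulation design can be reproduced with a short Python sketch. The function names and the use of NumPy's random generator are assumptions of this illustration. The two closed-form helpers recover the quoted 20% censoring level, which holds for mean-one exponential variables as generated under the null hypothesis, and the quoted Pearson partial correlations.

```python
import numpy as np

def censoring_fraction(mean_t=1.0, mean_c=4.0):
    """P(C < T) for independent exponential T and C with the given means."""
    rate_t, rate_c = 1.0 / mean_t, 1.0 / mean_c
    return rate_c / (rate_c + rate_t)

def pearson_partial(a, b):
    """Correlation of T1 = a*T1s + b*T4 and T2 = a*T2s + b*T4.

    All components are independent with unit variance, and T3 is independent
    of T1 and T2, so the partial correlation given T3 equals this value.
    """
    return b**2 / (a**2 + b**2)

def simulate(n, a, b, rng):
    """One data set; a = 1, b = 0 gives the null hypothesis."""
    t1s, t2s, t3, t4 = rng.exponential(1.0, size=(4, n))
    c = rng.exponential(4.0, size=(3, n))            # censoring variables
    t = np.vstack([a * t1s + b * t4, a * t2s + b * t4, t3])
    x = np.minimum(t, c)                             # observed values X_ki
    delta = (t < c).astype(int)                      # 1 = uncensored
    return x, delta
```

Feeding the output of `simulate` into a test routine for the partial $\tau$ over many replications, and counting rejections, gives empirical estimates of the attained level and the power.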

Recall that we chose $\alpha = 0.05$. Thus, if the small-sample
distribution of the test statistic is well approximated by its
asymptotic distribution, the attained level of the test procedure
should be approximately 0.05 (i.e. it should reject
about 50 times out of 1000 simulations under the null hypothesis).
Large deviations from that indicate a poor approximation
to the small-sample distribution. From the statistical
point of view, the attained level of 0.071 is significantly different
from the chosen $\alpha = 0.05$ (a 95% confidence interval
for the attained level is (0.055, 0.087)). From the practical
point of view, however, the difference is not significant; in
fact, for a sample of size 30 with 20% censoring, it is quite
satisfactory. The power of the procedure is rather low for
small departures from the null hypothesis but it increases
very noticeably as the departures become more pronounced.
For larger sample sizes, the attained level should be closer to
0.05, and the power should be greater. To verify this we repeated
the simulations, changing only the sample size to 80.
With this sample size, the attained level was 0.049, and the
power against the four alternatives was 0.158, 0.746, 0.999,
and 1.000.
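
The quoted confidence interval for the attained level is reproduced by the usual normal approximation to a binomial proportion, as the following Python sketch shows. That this particular approximation was used is an assumption, and the function name is hypothetical.

```python
import math

def level_confidence_interval(rejections, n_sim, z=1.96):
    """Normal-approximation 95% interval for a rejection proportion."""
    p = rejections / n_sim
    half_width = z * math.sqrt(p * (1.0 - p) / n_sim)
    return p - half_width, p + half_width
```

With 71 rejections out of 1000 the interval excludes 0.05, whereas with 49 rejections out of 1000 (sample size 80) it contains 0.05.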



3.2 Application to astronomical data 

As an application of the procedure described in §2 to an
astrophysical problem we further investigated the sample
already discussed by Morganti et al. (1993), Tadhunter et
al. (1993) and Siebert et al. (1995). In total it consists of
88 sources (68 radio galaxies, 18 quasars, 2 BL Lac objects)
which were selected from the Wall & Peacock 2.7 GHz sample
(Wall & Peacock 1985) of radio sources. The selection
criteria were: redshift $z < 0.7$, radio flux density $S_{2.7\,\mathrm{GHz}} > 2$
Jy and declination $\delta < 10°$.

One of the key issues of the study was to investigate the
relationship of the radio to the soft X-ray emission in the
(0.1-2.4) keV ROSAT energy band. In Figures 1 and 2 we
show a plot of the soft X-ray luminosity $L_x$ versus the total
radio luminosity $L_t$ and the core radio luminosity $L_c$,
respectively. Clearly, a correlation is visible in both diagrams.
Indeed, the correlation and regression analysis using ASURV
(La Valley et al. 1992, Feigelson & Nelson 1985, Isobe et al.
1986) shows that the radio and the soft X-ray emission are
correlated, both for the galaxies and the quasars, although
the statistical significances of the correlations are low in the
case of the quasars. This is probably due to the small sample
size and the small range in luminosity.

Because of the flux limit of the original Wall & Peacock
radio catalog, $L_t$ is strongly correlated with redshift. Further,
the correlations of $L_x$ with $L_c$ and $L_t$ are not mutually
independent since $L_c$ is also correlated with $L_t$. In order to
evaluate the influence of the individual redshift-luminosity
correlations and the $L_c$ - $L_t$ correlation on the correlations
with $L_x$, we applied the procedure developed in §2 to this
data set.

Notes. Column (1): AGN class. Column (2): Number of objects in each class. Columns (3), (4):
Independent (X) and dependent (Y) variable, respectively. The number of upper limits is given in the second line.
Column (5): Kendall's $\tau$ of the radio vs X-ray correlation with the corresponding probability that the
correlation arises by chance given in the second line. Column (6): Partial Kendall's $\tau$ with the effect of redshift
excluded, together with the calculated variance (see §2). Column (7): Probability of erroneously rejecting
the null hypothesis (i.e. no correlation). Columns (8), (9): Same as in columns (6) and (7), but with the effect
of the $L_c$ - $L_t$ correlation taken into account.

Figure 1. Total rest frame 2.7 GHz radio luminosity versus soft
X-ray luminosity in the (0.1-2.4) keV energy band. Full dots denote
quasars, whereas galaxies are plotted with open squares.
Upper limits are indicated by arrows.

Figure 2. Radio core luminosity at 2.7 GHz versus soft X-ray
luminosity. Full dots denote quasars, whereas galaxies are plotted
with open squares. Upper limits are indicated by arrows.

In the case of the quasars, the $L_x$ - $L_t$ correlation seems
to be strongly affected by both the redshift bias and the
$L_c$ - $L_t$ correlation. It turns out that the correlation is no
longer statistically significant once both selection effects are
properly accounted for. The $L_x$ - $L_c$ correlation is much less
affected and the probability of erroneously rejecting the null
hypothesis of no correlation is about 4%. As we have shown in
the previous section, the power of the statistical test depends
on the sample size. Given the low number of quasars, an
error probability of 4% is acceptable. We thus conclude that
there is indeed a correlation between $L_x$ and $L_c$ for quasars
and that the $L_x$ - $L_t$ correlation is probably an artifact of
the redshift bias and/or the strong relation of $L_t$ with $L_c$.

The results for the radio galaxies are similar. The $L_x$
vs $L_c$ correlation remains highly significant in the partial
correlation analysis, whereas we find evidence that the $L_x$ -
$L_t$ correlation is most likely introduced by the redshift bias.

The fact that the $L_x$ - $L_c$ correlation is independent of
redshift effects in both object classes is not surprising, since,
because of the inclusion of upper limit values in the analysis,
$L_x$ as well as $L_c$ do not depend a priori on redshift.

For a discussion of the results with respect to unification 
schemes and physical emission processes, see Siebert et al. 
(1995). 
4 SUMMARY

In this paper we present a new methodology to test for partial
association in censored (astronomical) data. This procedure
is based on Kendall's $\tau$ statistic and, for the
first time, allows a significance level to be assigned to the
resulting partial correlation coefficient. Tests with simulated data show
that the procedure gives reliable results, although the power
of the statistical test also depends on the sample size.

We applied the new method to a sample of 18 quasars
and 68 radio galaxies defined in Morganti et al. (1993) in order
to investigate the influence of two selection effects on the
observed correlation of $L_x$ with both $L_t$ and $L_c$, namely the
strong correlations of $L_t$ with redshift and with $L_c$. Whereas
we find evidence that the $L_x$ - $L_t$ correlation is most likely
an artifact of the redshift bias in both object classes, we conclude
that the $L_x$ - $L_c$ correlation is not affected by either
of the selection effects in galaxies as well as in quasars.
APPENDIX A: MATHEMATICAL
DERIVATIONS 

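The idea is to express the numerator of Kendall's partial $\tau$ as
a $U$-statistic and then use existing theory (Lee 1990; Serfling
1980). We will use the notation introduced in Section
2. With $\sum^*$ denoting summation over pairs $i < j$, $i_1 < j_1$ whose
four indices are all distinct, write

$$
\begin{aligned}
\hat\tau_{12} - \hat\tau_{13}\hat\tau_{23}
&= \frac{2}{n(n-1)} \sum_{i<j} h_{12}(i,j)
 - \frac{4}{n^2(n-1)^2} \sum_{i<j} \sum_{i_1<j_1} h_{13}(i,j)\, h_{23}(i_1,j_1) \\
&= \frac{4}{n^2(n-1)^2} \sum_{i<j} \sum_{i_1<j_1} \left[ h_{12}(i,j) - h_{13}(i,j)\, h_{23}(i_1,j_1) \right] \\
&= \frac{4}{n^2(n-1)^2} \sum_{i<j} \sum_{i_1<j_1} q(i,j,i_1,j_1) \\
&= \frac{4}{n(n-1)(n-2)(n-3)} \sum\nolimits^{*} q(i,j,i_1,j_1) + O\!\left(\frac{1}{n}\right) \\
&= \frac{24}{n(n-1)(n-2)(n-3)} \sum_{i<j<i_1<j_1} g(i,j,i_1,j_1) + O\!\left(\frac{1}{n}\right),
\end{aligned}
$$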

where $O(\frac{1}{n})$ denotes a quantity that when multiplied by $n$
remains bounded as $n \to \infty$. The first term on the right
hand side is a $U$-statistic that has mean value zero under
the null hypothesis. Thus, from Serfling (1980, p. 188) it
follows that, under the null hypothesis, $\hat\tau_{12} - \hat\tau_{13}\hat\tau_{23}$ has the
same asymptotic distribution as its 'projection'

$$\frac{4}{n} \sum_{i=1}^{n} P_i,$$

where the $P_i$ are independent and identically distributed
random variables and are described in the preceding reference.
Thus its asymptotic variance is $16\, n^{-1}\, \mathrm{Var}(P_1)$. The
estimate of $\mathrm{Var}(P_1)$ given in (1) is the estimate proposed by
Sen (1960), modified to increase the sensitivity of the testing
procedure under the alternative hypothesis.
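
The rewriting of the numerator $\hat\tau_{12} - \hat\tau_{13}\hat\tau_{23}$ as a double sum of $q$ over pairs of index pairs is an exact algebraic identity and can be checked numerically. The Python sketch below is only an illustration: the function names are hypothetical, and arbitrary symmetric matrices stand in for $h_{12}$, $h_{13}$, $h_{23}$. It also counts the fraction of terms in which all four indices are distinct, which tends to one at rate $1/n$.

```python
import numpy as np

def numerator_two_ways(h12, h13, h23):
    """Return tau12 - tau13*tau23 and the equivalent double sum of q."""
    n = h12.shape[0]
    iu = np.triu_indices(n, k=1)          # index pairs with i < j
    a, b, c = h12[iu], h13[iu], h23[iu]
    n_pairs = len(a)                      # n(n-1)/2, so 1/n_pairs = 2/(n(n-1))
    direct = a.sum() / n_pairs - (b.sum() / n_pairs) * (c.sum() / n_pairs)
    # q(i,j,i1,j1) = h12(i,j) - h13(i,j) * h23(i1,j1) on all pairs of pairs
    q = a[:, None] - b[:, None] * c[None, :]
    double_sum = q.sum() / n_pairs**2     # prefactor 4 / (n^2 (n-1)^2)
    return direct, double_sum

def distinct_fraction(n):
    """Fraction of (i<j, i1<j1) combinations with four distinct indices."""
    n_pairs = n * (n - 1) // 2
    n_distinct = n * (n - 1) * (n - 2) * (n - 3) // 4
    return n_distinct / n_pairs**2
```

For $n = 30$ about 87% of the terms already have four distinct indices, and the remaining terms make up the $O(1/n)$ correction.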



